# Mixture of Experts Architecture

**Bytedance BAGEL 7B MoT INT8** · Apache-2.0 · Gapeleon · 190 downloads · 20 likes
BAGEL is an open-source multimodal foundation model with 7B active parameters, supporting both multimodal understanding and generation tasks.
Tags: Text-to-Image

**BAGEL 7B MoT** · Apache-2.0 · ByteDance-Seed · 4,736 downloads · 769 likes
BAGEL is an open-source multimodal foundation model with 7B active parameters, trained on large-scale interleaved multimodal data and excelling at both understanding and generation tasks.
Tags: Text-to-Image

**Qwen3 1.7B GGUF** · Apache-2.0 · prithivMLmods · 357 downloads · 1 like
Qwen3 is the latest generation of the Tongyi Qianwen (Qwen) large language model series, offering both dense and Mixture of Experts (MoE) models. Built on large-scale training, Qwen3 delivers marked improvements in reasoning, instruction following, agent capabilities, and multilingual support.
Tags: Large Language Model, English

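GGUF builds like the one above are typically run locally with llama.cpp or one of its bindings. Below is a minimal, hedged sketch using the llama-cpp-python binding; the file name is illustrative only (check the repository for the actual quantization files), and the sampling settings are arbitrary.

```python
# Minimal sketch: running a GGUF quantization locally with llama-cpp-python.
# The model path is a placeholder; download a real GGUF file from the repository
# listed above and point model_path at it.
from llama_cpp import Llama

llm = Llama(
    model_path="./Qwen3-1.7B-Q4_K_M.gguf",  # illustrative file name
    n_ctx=4096,        # context window to allocate
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain mixture-of-experts in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```
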
**Ling Lite 1.5** · MIT · inclusionAI · 46 downloads · 3 likes
Ling is a large-scale Mixture of Experts (MoE) language model open-sourced by InclusionAI. The Lite version has 16.8 billion total parameters with 2.75 billion activated parameters and delivers strong performance for its size.
Tags: Large Language Model, Transformers

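The total-versus-active parameter split quoted in entries like Ling Lite (16.8B total, 2.75B activated) comes from sparse routing: each token is sent to only a few of the available expert MLPs. The PyTorch sketch below is a schematic illustration of top-k gating, not the implementation used by any specific model on this page.

```python
# Schematic top-k mixture-of-experts layer (illustration only): every expert's
# weights exist ("total parameters"), but each token runs through only top_k of
# them ("active parameters"), which is how a 16.8B model can cost ~2.75B per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=256, d_ff=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.gate(x)                    # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # dispatch each token to its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(4, 256)
print(TopKMoE()(tokens).shape)  # torch.Size([4, 256])
```
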
**Qwen3 30B A7.5B 24 Grand Brainstorm** · DavidAU · 55 downloads · 7 likes
A fine-tune of the Qwen3-30B-A3B Mixture of Experts model with the number of active experts raised from 8 to 24, intended for complex tasks that require deep reasoning.
Tags: Large Language Model, Transformers

**Qwen3 30B A6B 16 Extreme 128k Context** · DavidAU · 72 downloads · 7 likes
A fine-tune of the Qwen3-30B-A3B Mixture of Experts model with the number of active experts raised to 16 and the context window extended to 128k, intended for complex reasoning scenarios.
Tags: Large Language Model, Transformers

**Qwen3 30B A1.5B High Speed** · DavidAU · 179 downloads · 7 likes
A speed-optimized variant of Qwen3-30B that roughly doubles inference speed by reducing the number of active experts, intended for text generation scenarios that need fast responses.
Tags: Large Language Model, Transformers

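The three DavidAU variants above differ mainly in how many experts fire per token on top of the same Qwen3-30B-A3B base. As a hedged illustration of the knob involved, the sketch below reads and overrides `num_experts_per_tok` on the base model's config; the repo id and the attribute name are assumptions (the attribute name is the one used by several MoE configs in transformers), and the published variants bake such changes into their checkpoints rather than overriding at load time.

```python
# Hypothetical sketch: changing how many routed experts are activated per token
# for a Qwen3-30B-A3B-style MoE checkpoint. The repo id and config attribute are
# assumptions; verify both against the actual checkpoint before relying on this.
from transformers import AutoConfig, AutoModelForCausalLM

repo = "Qwen/Qwen3-30B-A3B"  # assumed base repo id, for illustration only
config = AutoConfig.from_pretrained(repo)
print("default experts per token:", getattr(config, "num_experts_per_tok", "n/a"))

# More experts per token: slower inference, potentially deeper reasoning.
config.num_experts_per_tok = 16
model = AutoModelForCausalLM.from_pretrained(
    repo, config=config, device_map="auto", torch_dtype="auto"
)
```
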
**Qwen3 235B A22B AWQ** · Apache-2.0 · cognitivecomputations · 2,563 downloads · 9 likes
Qwen3-235B-A22B is the latest-generation large language model in the Qwen series, built on a Mixture of Experts (MoE) architecture with 235 billion total parameters and 22 billion active parameters. It excels at reasoning, instruction following, agent capabilities, and multilingual support.
Tags: Large Language Model, Transformers

**Nomic Embed Text V2 Moe GGUF** · Apache-2.0 · nomic-ai · 14.06k downloads · 13 likes
A multilingual Mixture of Experts text embedding model that supports roughly 100 languages and performs strongly on multilingual retrieval.
Tags: Text Embedding, Multilingual

**Nomic Embed Text V2 GGUF** · Apache-2.0 · ggml-org · 317 downloads · 3 likes
Nomic Embed Text V2 GGUF is a multilingual text embedding model supporting over 70 languages, suited to sentence similarity and feature extraction tasks.
Tags: Text Embedding, Multilingual

**Qwen3 235B A22B GGUF** · MIT · ubergarm · 889 downloads · 16 likes
Qwen3-235B-A22B is a 235-billion-parameter large language model; this build applies advanced non-linear quantization via the ik_llama.cpp fork and targets high-performance computing environments.
Tags: Large Language Model

**Qwen3 235B A22B** · Apache-2.0 · Qwen · 159.10k downloads · 849 likes
Qwen3 is the latest generation of the Tongyi Qianwen (Qwen) large language model series, offering a full suite of dense and Mixture of Experts (MoE) models and delivering marked improvements in reasoning, instruction following, agent capabilities, and multilingual support.
Tags: Large Language Model, Transformers

**MAI DS R1 GGUF** · MIT · unsloth · 916 downloads · 4 likes
MAI-DS-R1 is the DeepSeek-R1 reasoning model further trained by Microsoft's AI team to improve its responsiveness on restricted topics and its risk profile, while preserving its reasoning capabilities and competitive performance.
Tags: Large Language Model

**Llama3.1 MOE 4X8B Gated IQ Multi Tier COGITO Deep Reasoning 32B GGUF** · Apache-2.0 · DavidAU · 829 downloads · 2 likes
A Mixture of Experts (MoE) model with adjustable reasoning that combines four Llama 3.1 8B models to strengthen inference and text generation.
Tags: Large Language Model, Multilingual

**MAI DS R1** · MIT · microsoft · 8,840 downloads · 250 likes
MAI-DS-R1 is the result of Microsoft AI's post-training of the DeepSeek-R1 reasoning model, aimed at improving its responsiveness on sensitive topics and its risk profile while preserving the original model's reasoning ability and competitive strengths.
Tags: Large Language Model, Transformers

**Llama 4 Scout 17B 16E Linearized Bnb Nf4 Bf16** · Other · axolotl-quants · 6,861 downloads · 3 likes
Llama 4 Scout is a Mixture of Experts (MoE) model from Meta with 17 billion active parameters, supporting multilingual text and image understanding; its expert modules are linearized here for PEFT/LoRA compatibility.
Tags: Multimodal Fusion, Transformers, Multilingual

**Llama 4 Scout 17B 16E Unsloth** · Other · unsloth · 67 downloads · 1 like
Llama 4 Scout is a multimodal AI model from Meta with 17 billion active parameters and a Mixture of Experts architecture, supporting 12 languages and image understanding.
Tags: Text-to-Image, Transformers, Multilingual

**Doge 120M MoE Instruct** · Apache-2.0 · SmallDoge · 240 downloads · 1 like
The Doge models use dynamic masked attention for sequence transformation and can use either multi-layer perceptrons or cross-domain mixture of experts for state transformation.
Tags: Large Language Model, Transformers, English

**Llama 4 Maverick 17B 128E** · Other · meta-llama · 3,261 downloads · 69 likes
Llama 4 Maverick is a multimodal AI model from Meta built on a Mixture of Experts architecture, supporting text and image understanding, with 17 billion active parameters and about 400 billion total parameters.
Tags: Text-to-Image, Transformers, Multilingual

**Llama 4 Maverick 17B 128E Instruct** · Other · meta-llama · 87.79k downloads · 309 likes
Llama 4 Maverick is a multimodal AI model from Meta with 17 billion active parameters and a Mixture of Experts (MoE) architecture using 128 experts, supporting multilingual text and image understanding.
Tags: Large Language Model, Transformers, Multilingual

**Deepseek V3 0324 GGUF** · MIT · unsloth · 108.44k downloads · 177 likes
DeepSeek-V3-0324 is the March 2025 update of DeepSeek-V3 from the DeepSeek team, showing clear gains over its predecessor on multiple benchmarks; dynamic quantization builds are provided for local inference.
Tags: Large Language Model, English

**Llm Jp 3 8x13b Instruct3** · Apache-2.0 · llm-jp · 162 downloads · 3 likes
A large-scale Japanese-English MoE language model developed by Japan's National Institute of Informatics, with an 8x13B parameter configuration and instruction fine-tuning.
Tags: Large Language Model, Transformers, Multilingual

**Qwen2.5 MOE 2X1.5B DeepSeek Uncensored Censored 4B Gguf** · Apache-2.0 · DavidAU · 678 downloads · 5 likes
A Qwen2.5 MOE (Mixture of Experts) model that combines two Qwen 2.5 DeepSeek 1.5B models (one censored/regular, one uncensored) into a 4B model in which the uncensored DeepSeek Qwen 2.5 1.5B dominates the model's behavior.
Tags: Large Language Model, Multilingual

**Hiber Multi 10B Instruct** · Hibernates · 86 downloads · 2 likes
Hiber-Multi-10B-Instruct is an advanced multilingual large language model built on the Transformer architecture, with 10 billion parameters and support for multiple languages, suited to text generation tasks.
Tags: Large Language Model, Transformers, Multilingual

**Nomic Embed Text V2 Moe Unsupervised** · nomic-ai · 161 downloads · 5 likes
An intermediate checkpoint of a multilingual Mixture of Experts (MoE) text embedding model, produced by multi-stage contrastive training.
Tags: Text Embedding

**Nomic Embed Text V2 Moe** · Apache-2.0 · nomic-ai · 242.32k downloads · 357 likes
Nomic Embed v2 is a high-performance multilingual Mixture of Experts (MoE) text embedding model that supports roughly 100 languages and excels at multilingual retrieval.
Tags: Text Embedding, Multilingual

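As a rough sketch of how an embedding model like this is used for multilingual retrieval, the snippet below goes through sentence-transformers. The repo id is taken from the entry above, while `trust_remote_code=True` and the omission of task prefixes are assumptions; follow the model card for the exact recommended query and document prompts.

```python
# Minimal sketch: multilingual similarity with a MoE text-embedding model via
# sentence-transformers. trust_remote_code and prompt handling are assumptions;
# the model card documents the exact query/document prefixes to use.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe", trust_remote_code=True)

docs = [
    "Mixture-of-experts models activate only a few experts per token.",
    "Los modelos de mezcla de expertos activan solo algunos expertos por token.",
]
query = "How do MoE language models keep inference cheap?"

doc_emb = model.encode(docs)
query_emb = model.encode(query)
print(util.cos_sim(query_emb, doc_emb))  # cosine similarity against each document
```
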
**Deepseek R1** · MIT · deepseek-ai · 1.7M downloads · 12.03k likes
DeepSeek-R1 is DeepSeek's first-generation reasoning model. Trained with large-scale reinforcement learning, it performs strongly on mathematics, code, and reasoning tasks.
Tags: Large Language Model, Transformers

**Falcon3 MoE 2x7B Insruct** · Other · ehristoforu · 273 downloads · 10 likes
A Mixture of Experts model built from Falcon3 7B-IT and 7B-IT, totaling 13.4 billion parameters, supporting English, French, Spanish, and Portuguese with a context length of up to 32K.
Tags: Large Language Model, Safetensors, English

**Llama 3.2 4X3B MOE Ultra Instruct 10B GGUF** · Apache-2.0 · DavidAU · 277 downloads · 7 likes
A Mixture of Experts model based on Llama 3.2 that combines four 3B models into a 10B-parameter model, supporting 128k context and performing well at instruction following and general-purpose generation.
Tags: Large Language Model, English

**Timemoe 200M** · Apache-2.0 · Maple728 · 14.01k downloads · 7 likes
TimeMoE-200M belongs to the TimeMoE family of billion-scale, Mixture of Experts (MoE) based time series foundation models and focuses on time series forecasting.
Tags: Time Series Forecasting

**Chartmoe** · Apache-2.0 · IDEA-FinAI · 250 downloads · 12 likes
ChartMoE is a multimodal large language model based on InternLM-XComposer2, using a Mixture of Experts connector and offering advanced chart understanding capabilities.
Tags: Image-to-Text, Transformers

**Deepseek V2 Lite** · ZZichen · 20 downloads · 1 like
DeepSeek-V2-Lite is an economical Mixture of Experts (MoE) language model with 16B total parameters and 2.4B active parameters, supporting a 32k context length.
Tags: Large Language Model, Transformers

**Mixtral 8x22B V0.1 GGUF** · Apache-2.0 · MaziyarPanahi · 170.27k downloads · 74 likes
Mixtral 8x22B is a sparse Mixture of Experts model released by Mistral AI (around 141 billion total parameters, 39 billion active), supporting multilingual text generation.
Tags: Large Language Model, Multilingual

**Dbrx Instruct** · Other · databricks · 5,005 downloads · 1,112 likes
A Mixture of Experts (MoE) large language model developed by Databricks, specialized in few-turn interactions.
Tags: Large Language Model, Transformers

**Dbrx Base** · Other · databricks · 100 downloads · 557 likes
A Mixture of Experts (MoE) large language model developed by Databricks, with 132 billion total parameters, 36 billion active parameters, and a 32K context window.
Tags: Large Language Model, Transformers

**Xlam V0.1 R** · Salesforce · 190 downloads · 53 likes
xLAM-v0.1 is a major upgrade in the Large Action Model series, fine-tuned across a wide range of agent tasks and scenarios while keeping the same parameter count and preserving the original model's capabilities.
Tags: Large Language Model, Transformers

**Openbuddy Mixtral 7bx8 V18.1 32k GGUF** · Apache-2.0 · nold · 79 downloads · 2 likes
OpenBuddy is an open multilingual chatbot model based on the Mixtral-8x7B architecture, suited to multilingual dialogue scenarios.
Tags: Large Language Model, Multilingual

**Moe LLaVA Qwen 1.8B 4e** · Apache-2.0 · LanguageBind · 176 downloads · 14 likes
MoE-LLaVA is a large vision-language model built on a Mixture of Experts architecture, achieving efficient multimodal learning through sparsely activated parameters.
Tags: Text-to-Image, Transformers

**Discolm Mixtral 8x7b V2** · Apache-2.0 · DiscoResearch · 205 downloads · 124 likes
An experimental 8x7B Mixture of Experts model based on Mistral AI's Mixtral 8x7b, fine-tuned on the Synthia, MetaMathQA, and Capybara datasets.
Tags: Large Language Model, Transformers, English

**Mixtral 7b 8expert** · Apache-2.0 · DiscoResearch · 57.47k downloads · 264 likes
A Mixture of Experts (MoE) model released by MistralAI (Mixtral 8x7B), supporting multilingual text generation tasks.
Tags: Large Language Model, Transformers, Multilingual